American Journal of Epidemiology
Oxford University Press (OUP)
Preprints posted in the last 30 days, ranked by how well they match American Journal of Epidemiology's content profile, based on 57 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.
Li, K.; Hou, Y.; Mukherjee, B.; Pitzer, V. E.; Weinberger, D. M.
Household transmission studies are important for understanding infectious disease transmission and evaluating interventions; however, they are frequently constrained by methodological challenges, including in study design and sample size determination, and in estimating parameters of interest after collecting the data. Existing tools often lack flexibility in modeling age-specific susceptibility, infectivity patterns, and the impact of interventions such as vaccination or prophylaxis. Here, we develop HHBayes, an open-source R package that provides a unified framework for simulating and analyzing household transmission data using Bayesian methods. The package enables researchers to: (1) simulate realistic household transmission dynamics with highly customizable variables; (2) incorporate viral load data (measured in viral copies/mL or cycle threshold values) to model time-varying infectiousness; (3) estimate age-dependent susceptibility and infectivity parameters using Hamiltonian Monte Carlo methods implemented in Stan; and (4) evaluate intervention effects through user-defined covariates that modify susceptibility or infectivity. We demonstrate the capabilities of the package through simulation studies showing accurate parameter recovery and applications to seasonal respiratory virus transmission, including the impact of vaccination and antiviral prophylaxis on household attack rates. HHBayes addresses a critical gap in infectious disease epidemiology by providing researchers with accessible tools for both prospective study design and retrospective data analysis. The flexibility of the package in handling complex household structures, time-varying infectiousness, and intervention effects makes it valuable for studying diverse pathogens.
Echeverria, S.; Seo, Y.; Borrell, L. N.; McKelvey, D.; Najjar, T.; Reifsteck, E. J.; Erausquin, J. T.; Maher, J. P.
Background. Physical activity (PA) and body mass index (BMI) shape cardiovascular risk, particularly in women. Yet little research has examined the intersectional social axes shaping PA and BMI inequities among women living in the United States (US). Methods. Data included women sampled in the 2015-2020 National Health and Nutrition Examination Survey. We used Intersectional Multilevel Analysis of Individual Heterogeneity and Discriminatory Accuracy (I-MAIHDA) via linear models to examine PA (n=4,591) and BMI (n=4,596) inequities across intersectional strata defined by race/ethnicity, age, education, nativity, and work status. We further quantified the contribution of these strata to the observed inequities and estimated additive fixed effects. Results. In the null model, intersectional strata explained 4.6% and 13.8% of the variance in PA and BMI inequities, respectively, with 99.2% for PA and 97.5% for BMI explained by age, race/ethnicity, education, nativity, and occupation status. On average, Asian and Black women, those aged 35-49 years, those born outside the US, and those with less than a high school diploma had the lowest predicted mean PA. For BMI, Black and Hispanic/Latino women and those younger than 64 years had the highest mean BMI. Conclusion. PA and BMI inequities are mostly explained by race/ethnicity, age, education, nativity, and work status. Our findings offer insights into universal and potential policy-informed health promotion strategies that may be tailored to women with these social identities and lived experiences that have shaped physical activity and body mass index inequities.
Ahlqvist, V. H.; Sjoqvist, H.; Gardner, R. M.; Lee, B. K.
Background: Sibling-matched designs control for shared familial confounding but remain vulnerable to non-shared confounders. Bi-directional sensitivity analyses, which stratify families by whether the older or younger sibling was exposed, are commonly used to assess carryover effects. We aimed to demonstrate how this methodological approach can introduce severe confounding by parity. Methods: We conducted simulations motivated by a recent epidemiological study. The true causal effect of a hypothetical exposure (prenatal acetaminophen) on neurodevelopmental outcomes was set to strictly null. To introduce parity-related confounding, baseline exposure and outcome probabilities were varied slightly by birth order. We compared conditional logistic regression effect estimates from total sibling models against bi-directional stratified models. Results: In the total simulated sibling cohort, models yielded the true null effect (odds ratio = 1.00) when adjusting for parity. However, the bi-directional analyses exhibited divergent artifactual signals. Because parity is perfectly collinear with exposure in these stratified subsets, it cannot be adjusted for. For example, when the older sibling was exposed, the odds ratio for autism spectrum disorder was 1.68; when the younger was exposed, the odds ratio was 0.60. Conclusions: Divergent estimates in bi-directional sibling analyses can be a predictable artifact of parity confounding rather than evidence of carryover effects or invalidating unmeasured bias. Overall sibling models adjusting for parity may remain robust despite divergent stratified sensitivity results.
Mwakazanga, D. K.; Daka, V.; Gwasupika, J. K.; Dombola, A. K.; Kapungu, K. K.; Khondowe, S.; Chongwe, G. K.; Fwemba, I.; Ogundimu, E.
Medical male circumcision (MMC) is an established HIV prevention intervention, yet concerns persist that circumcised men may adopt higher-risk sexual behaviours following the procedure. Evidence from observational studies has been inconsistent, partly because many analyses do not adequately distinguish behaviours that occur before circumcision from those that occur afterward. This study assessed the association between MMC and subsequent sexual behaviours while demonstrating how population-based cross-sectional survey data can be adapted to address this temporal challenge. We analysed nationally representative data from the 2024 Zambia Demographic and Health Survey (ZDHS), including men aged 15 - 59 years who reported their circumcision status. Men who had undergone medical circumcision were compared with uncircumcised men using a matched pseudo-cohort framework that reconstructed temporal ordering based on age at circumcision. Propensity score overlap weighting was applied to improve comparability between circumcised and uncircumcised men, and odds ratios were estimated using logistic regression models incorporating overlap weights and accounting for the complex survey design. Sexual behaviour outcomes occurring after circumcision included condom non-use at last sexual intercourse, multiple sexual partners in the past 12 months, self-reported sexually transmitted infection (STI) symptoms, and composite measures of sexual risk behaviour. The analysis included 9,609 men, of whom 33.3% were medically circumcised. MMC was associated with lower odds of condom non-use at last sexual intercourse (adjusted odds ratio [aOR] = 0.75, 95% confidence interval [CI]: 0.67 - 0.85) and lower odds of reporting any sexual risk behaviour (aOR = 0.83, 95% CI: 0.72 - 0.95). No meaningful associations were observed between MMC and reporting multiple sexual partners, self-reported STI symptoms, or higher levels of composite sexual risk behaviour. 
In this population-based study, MMC was not associated with sexual risk compensation under routine programme conditions within the overlap population defined by the weighting scheme, supporting the behavioural safety of MMC and illustrating the value of explicitly addressing temporality when analysing behavioural outcomes using cross-sectional survey data.
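The overlap weighting step mentioned in this abstract can be sketched as follows. This is a minimal illustration of the standard overlap-weight definition (treated units weighted by 1 - e, comparison units by e, where e is the estimated propensity score), not the authors' code; the function names and the toy values are hypothetical, and survey design weights are deliberately left out.

```python
def overlap_weights(propensity_scores, treated):
    """Overlap weights: 1 - e for treated units, e for comparison units."""
    weights = []
    for e, t in zip(propensity_scores, treated):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 - e if t else e)
    return weights


def weighted_mean(values, weights):
    """Weighted mean of a covariate, used to check balance between groups."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical toy data: propensity of being circumcised and observed status.
scores = [0.9, 0.5, 0.5, 0.1]
treated = [True, True, False, False]
weights = overlap_weights(scores, treated)
```

Units with propensity scores near 0 or 1 receive small weights, so the weighted sample concentrates on men who could plausibly have been in either group, which is the "overlap population" the abstract refers to.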
Larsen, T. E.; Lorca, M. H.; Ekstrom, C. T.; Vinding, R.; Bonnelykke, K.; Strandberg-Larsen, K.; Petersen, A. H.
Childhood weight development, especially overweight and obesity, has been associated with mental health, but their dynamic, causal relationships, and whether these differ by sex, remain unclear. We applied causal discovery to data from the Danish National Birth Cohort (n=67,593) spanning six periods from pregnancy to late adolescence and considering 67 variables related to child and parental weight, mental health, lifestyle, and socio-economic factors. We found no statistically significant difference between the causal graphs for boys and girls (P=0.079). The data-driven models found causal influence of childhood weight on subsequent weight status. Mental health pathways were exclusively within or across adjacent periods and centered on early adolescent stress. We examined the interplay between a subset of mental health variables, containing information on externalizing and internalizing problems, and weight, and found no direct causal pathway between the two processes. These findings suggest that observed links between weight and these mental health measures may be attributable to confounding. Our findings demonstrate the value of data-driven causal discovery in large cohort studies and how to test for differences in causal mechanisms across subgroups. Results are available in an interactive application, enabling future research to further explore the interplay between weight and mental health.
Goncalves, B. P.; Franco, E. L.
Timeliness of therapy initiation is a fundamental determinant of outcomes for many medical conditions, most importantly, cancer. Yet, existing inefficiencies in healthcare systems mean that delays between diagnosis and treatment frequently adversely affect the clinical outcome for cancer patients. Although estimates of effects of lag time to therapy would be informative to policymakers considering resource allocation to minimize delays in oncology, causal methods are seldom explicitly discussed in epidemiologic analyses of these lag times. Here, we propose causal estimands for such studies, and outline the protocol of a target trial that could be emulated with observational data on lag times. To illustrate the application of this approach, we simulate studies of lag time to treatment under two scenarios: one in which indication bias (Waiting Time Paradox) is present and another in which it is absent. Although our discussion focuses on oncologic outcomes, components of the proposed target trial could be adapted to study delays for other medical conditions. We believe that the clarity with which causal questions are posed under the target trial emulation framework would lead to improved quantification of the effects of lag times in oncology, and hence to better informed policy decisions.
Abrishamcar, S.; Eick, S. M.; Everson, T.; Suglia, S. F.; Fallin, M. D.; Wright, R. O.; Andra, S. S.; Chovatiya, J.; Jagani, R.; Barr, D. B.; Lussier, A. A.; Dunn, E. C.; MacIsaac, J. L.; Dever, K.; Kobor, M. S.; Hoffman, N.; Koen, N.; Zar, H. J.; Stein, D. J.; Hüls, A.
Background. Prenatal exposure to pesticides and psychosocial factors often co-occurs, particularly in low- and middle-income settings, yet their joint effects on epigenetic age acceleration (EAA) in early life remain unknown. We investigated the joint associations of prenatal pesticide metabolites and psychosocial factors with EAA in the first five years of life in the South African Drakenstein Child Health Study. Methods. In 643 mothers, we measured 11 urinary pesticide metabolites and seven psychosocial factors during the second trimester of pregnancy. Child DNA methylation was measured in whole blood at ages 1, 3, and 5 years. EAA was estimated using the Horvath, Skin & Blood Horvath (skinHorvath), and Wu epigenetic clocks. Longitudinal associations were estimated using generalized estimating equations, adjusted for confounders. Joint mixture associations were evaluated using weighted quantile sum regression (WQS) and quantile g-computation (QGCOMP). Results. The joint prenatal exposure mixture was positively associated with Wu (β per one quintile increase in the mixture [95% CI]: 0.41 years [0.15, 0.80]), skinHorvath (0.11 years [0.06, 0.16]), and Horvath EAA (0.31 years [0.20, 0.46]) over time using WQS. Psychosocial factors, particularly food insecurity, physical interpersonal violence, and stress biomarkers, contributed most to the total mixture effect for all clocks. Pyrethroid metabolites PBA and TDCCA were top pesticide contributors to Wu EAA. Pathway enrichment analyses of clock-specific CpGs revealed distinct biological architectures, with the Wu clock enriched for neurodevelopmental and immune pathways, and metabolic pathways for the Horvath clock. Discussion. Joint prenatal exposure to pesticides and psychosocial factors was associated with increased EAA across early childhood, with psychosocial factors contributing the most to the total effect. 
These findings highlight the importance of assessing chemical and non-chemical stressors jointly and clock-specific biological interpretation in epigenetic aging research.
Cook, S. H.
Background. Young sexual and gender minorities of color face compound health risks shaped by interlocking systems of racism, cisgenderism, and class inequality. Spatial health research documents that place shapes health, but existing methods cannot specify the mechanisms through which spatial configurations produce different health outcomes for differently positioned people. This gap prevents targeted intervention. Objective. To develop and pilot test the Spatial Intersectionality Health Framework (SIHF), which specifies three mechanisms through which space produces intersectional health inequities: Layered (multiple oppressive systems activating simultaneously), Positional (the same space producing different health pathways by intersectional position), and Conditional (nominally protective spaces carrying hidden costs for specific positions). We also introduce and validate Intersectional Geographically-Explicit Ecological Momentary Assessment (IGEMA) as the methodology operationalizing SIHF across three data levels. Methods. The GeoSense study enrolled 32 young sexual and gender minorities of color (ages 18-29) in New York City. IGEMA was implemented across three integrated levels: (1) GPS mobility tracking via participants' personal smartphones, linked to census tract structural exposure indices across n=19 participants; (2) ecological momentary assessment of intersectional discrimination with multilevel modeling of mood, stress, and sleep outcomes; and (3) map-guided qualitative interviews with SIHF mechanism coding and intercoder reliability assessment across 92 coded records from 18 participants. This study was conducted as the pilot for NIH R01HL169503. Results. All three SIHF mechanisms were empirically detectable. A compound structural gendered racism index outperformed every single-axis alternative in predicting daily mood (b=-0.048, p=.001) and stress (b=0.121, p<.001). The Positional mechanism accounted for 71% of coded harm experiences. 
Intercoder reliability for mechanism assignment reached kappa=0.824 at Stage 2 reconciliation. Daily intersectional discrimination predicted greater sleep disturbance (b=1.308, p=.004). Conclusions. SIHF and IGEMA together provide an empirically testable framework for specifying how space produces intersectional health inequities. Mechanism specification, not spatial location alone, is the condition for designing research and intervention that reaches the source of harm for multiply marginalized populations.
Blackburn, A.
Introduction: The Alcohol Use Disorders Identification Test-Consumption (AUDIT-C) is a widely utilized screening tool in large-scale electronic health record (EHR) biobanks. However, its categorical, range-based survey responses present a significant challenge for epidemiological research, especially where continuous quantitative variables may be preferred. Standard workarounds, such as assigning categorical midpoints or utilizing aggregate ordinal scores for regression mapping often introduce false mathematical precision or obscure critical behavioral nuances between drinking frequency and quantity. This report presents a novel framework for presenting and bounding categorical alcohol survey data. Materials and Methods: I developed two complementary descriptive techniques: (1) a two-dimensional cross-tabulation matrix that preserves the interaction between drinking frequency and typical quantity, and (2) a systematic bounding algorithm that applies time-interval correction factors to calculate strict lower and upper estimates of average daily alcohol consumption. To demonstrate the real-world utility of this framework, I applied these methods to three analytical descriptive scenarios within a European ancestry (EUR) cohort of the All of Us Research Program: Generalized Anxiety Disorder (GAD) prevalence (n=104,893), minor allele frequency (MAF) for the rs1229984 genetic variant (n=104,890), and self-reported active duty military service history (n=104,893). Results: Application of the cross-tabulation matrix revealed patterns across all three descriptive scenarios. For example, participants reporting the highest frequency ("4 or more times a week") combined with the highest quantity ("10 or More" drinks) demonstrated a GAD prevalence of 13.5%, compared to 5.8% among those reporting the same frequency but a low quantity ("1 or 2" drinks). 
A general trend of increased anxiety in higher quantity drinkers contrasts with a general trend of decreased anxiety in higher frequency drinkers. Bounding estimates for average daily consumption ranged from 0.299 to 0.730 drinks for individuals with GAD, and 0.303 to 0.787 for those without. Those who reported having been active duty in the US Armed Forces demonstrated a general trend toward more frequent drinking and higher average daily consumption estimates (0.339 to 0.875) than those who had not (0.297 to 0.770). The minor allele of the genetic variant rs1229984 exhibited a clear effect reducing both frequency and quantity, resulting in lower average daily consumption estimates. Conclusions: This bounding and mapping framework provides researchers with an additional method to traditional midpoint and aggregate scoring methods. By explicitly defining the uncertainty inherent in categorical survey instruments and visualizing cohort distributions across intersecting behavioral axes, this methodology improves the resolution, reproducibility, and interpretability of lifestyle exposure data.
Stevenson, M.; Reisner, S.; Pontes, C.; Linton, S.; Borquez, A.; Radix, A.; Schneider, J.; Cooney, E.; Wirtz, A.; ENCORE Study Group
Transgender women are routinely recruited for HIV prevention research and describe feeling over-researched, undervalued, and disconnected from the benefits of research. Research fatigue refers to the adverse impacts of research participation from the volume, frequency, or intensity of research engagement. Research beneficence, an underdeveloped construct, refers to perceptions that research participation is empowering, appreciated, and beneficial to individuals and communities. This study sought to develop and psychometrically evaluate a research fatigue and beneficence scale and examine associations with cohort retention and study procedures among transgender women in the US and Puerto Rico. We developed a novel 7-item measure of research fatigue and beneficence informed by prior literature and qualitative work with transgender women. We assessed internal consistency reliability, factor structure, convergent and divergent validity, and predictive validity with 6-month study retention outcomes and procedures among 2189 transgender women enrolled in a US nationwide cohort (April 2023-December 2024) for the full 7-item research fatigue and beneficence scale, a 4-item research beneficence subscale, and a single-item research fatigue measure. Research beneficence items demonstrated good internal consistency (0.78) and excellent model fit. Research fatigue and beneficence varied by race/ethnicity with participants of color reporting both greater empowerment and greater concerns about community-level benefits. The item "I feel that I am asked to participate in research too frequently" was associated with lower 6-month retention, greater survey missingness, and preference for less invasive HIV testing modalities. Findings highlight multiple dimensions of research experience and the need for reduced participant burden, culturally tailored study designs, and intentional dissemination efforts to improve participant-centered research practices.
McCormick, K. M.
Objectives. To test whether the association between household income and tooth retention differs by race/ethnicity and whether this interaction varies by reason for the most recent dental visit among US adults. Methods. We analyzed 13,190 adults in the National Health and Nutrition Examination Survey (2009 to 2018). Survey-weighted linear regression estimated interactions between household income and race/ethnicity in models of tooth retention, stratified by reason for last dental visit. Results. Higher income was associated with greater tooth retention across groups, but income-related gains were larger for Non-Hispanic White adults than for Non-Hispanic Black and Mexican American adults, particularly in problem-focused care settings. In problem-focused visits, each higher income category was associated with 0.5 additional teeth among White adults (95% CI 0.4, 0.6) versus 0.2 (95% CI 0.0, 0.4) among Black adults and 0.1 (95% CI 0.1, 0.3) among Mexican American adults. Racial differences were attenuated in routine check-up contexts. Conclusions. Income-related gains in tooth retention differed by race/ethnicity and dental care context. Public Health Implications. Expanding access alone may be insufficient to reduce racial inequities in oral health.
Traeholt, J.; Didriksen, M.; Helenius, D.; Christoffersen, L. A. N.; Dinh, K. M.; Dowsett, J.; Mikkelsen, C.; Hindhede, L.; Quinn, L. J. E.; Bruun, M. T.; Aagaard, B.; Hansen, T. F.; Hjalgrim, H.; Rostgaard, K.; Sorensen, E.; Erikstrup, C.; Pedersen, O. B. V.; Hansen, T.; Schork, A. J.; Markussen, B.; Ostrowski, S. R.
Selective participation in biobanks often compromises inference to the general population, particularly when selection occurs across multiple stages, whether at recruitment or during subsequent participation. Inverse probability (IP) weighting can reduce systematic differences using suitable external benchmarks, but most applications assume a single selection process. Here, we present a multi-stage IP-weighting framework and apply it to the Danish Blood Donor Study (DBDS), a nationwide biobank embedded in Denmark's blood-donation infrastructure. Using national registers, we estimated year-specific probabilities of (i) donation activity and (ii) DBDS enrolment conditional on donation activity, yielding two-stage inclusion weights for 169,893 participants. These weights reduced inclusion-associated imbalance across the 52 auxiliary variables in the probability models by 97.6% (median) and, despite strong health selection under donation-based recruitment, reduced relative-prevalence discrepancies across held-out prescription phenotypes by 69.7% (median). The effective sample size after weighting was 30,627 (18.0% of 169,893). Combining the inclusion weights with questionnaire-specific response weights across five DBDS questionnaires (>500 questions) produced the largest changes from unweighted to weighted responses for health behaviours and symptom severity, including tobacco and alcohol consumption, menstrual-pain severity, restless-legs severity, nocturia, sleep disturbance, and fatigue. These findings support multi-stage IP-weighting to improve population alignment in biobanks with staged selection.
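The two-stage weighting and the effective sample size reported in this abstract can be illustrated with a short sketch. It assumes each participant's inclusion weight is the inverse of the product of the two stage probabilities, and it uses Kish's approximation for the effective sample size; the abstract does not say which formula the authors used, so both choices and the function names are assumptions made for illustration.

```python
def two_stage_weights(p_stage1, p_stage2_given_stage1):
    """Inverse probability weights for two sequential selection stages.

    p_stage1: probability of passing the first stage (e.g. donation activity).
    p_stage2_given_stage1: probability of the second stage given the first
    (e.g. enrolment conditional on donation activity).
    """
    return [1.0 / (p1 * p2) for p1, p2 in zip(p_stage1, p_stage2_given_stage1)]


def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical probabilities for four participants.
weights = two_stage_weights([0.5, 0.5, 0.5, 0.25], [0.5, 0.5, 0.5, 0.5])
ess = kish_effective_sample_size(weights)
```

With equal weights the effective sample size equals the number of participants; the more variable the weights, the smaller it becomes, which is how strong selection can leave a weighted cohort with a small fraction of its nominal size.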
Liu, Y. E.; Li, B.; Warren, J. L.; Gonsalves, G. S.; Wang, E. A.
Decarceration, the process of reducing incarceration rates, is increasingly viewed as a strategy to improve population health and reduce health inequities. Yet, evidence on its health effects remains limited and may depend on how decarceration occurs. We developed a national decarceration "atlas" to characterize the mechanisms and dynamics of decarceration across more than 2,800 U.S. counties between 1999-2019. Using longitudinal county-level jail and prison data, we identified four operational types of decarceration: reduced pretrial detention, reduced jail time, reduced prison admissions, and reduced prison time. Nearly two-thirds of counties, including most rural counties, experienced at least one decarceration type during the study period. Declines typically followed periods of recent growth and were relatively modest in magnitude, with median reductions of 19% to 38% ten years after onset. The frequency and timing of decarceration types varied by urbanicity, state, and region, with many counties experiencing multiple mechanisms concurrently. Validation against documented case studies of state and local decarceration demonstrated alignment with known legislative and de facto drivers, while revealing substantial sub-state heterogeneity. This atlas provides a scalable framework and hypothesis-generating resource to support comparative studies of decarceration's heterogeneous health effects.
Hernandez, M. A.; Kwong, A. S.; Li, C.; Simpkin, A. J.; Wootton, R. E.; Joinson, C.; Elhakeem, A.
Understanding depressive symptoms dynamics and their determinants is crucial for designing effective mental health support initiatives. This study compared two methods for describing youth depressive symptoms trajectories and investigated associations of early-life factors (maternal education, maternal perinatal depression, domestic violence, physical, emotional, or sexual abuse, bullying victimisation, psychiatric disorder) with trajectory features. Prospective data from 8,264 mostly White European participants (54% female), including self-reported Short Moods and Feelings Questionnaires on ten occasions between 10-25 years, were used. Trajectories were summarised using functional principal component analysis (FPCA) and P-splines linear mixed-effect (PLME) models. Estimated derivatives were used to obtain magnitude and age of peak symptoms and peak symptoms velocity. Both methods performed comparably, but PLME models tended to over-smooth trajectories. Peak symptoms and peak velocity were higher and occurred >1 year earlier in females than males. All early-life factors were associated with higher peak symptoms, and most were associated with higher and earlier peak velocity. Abuse and bullying were additionally associated with earlier age of peak symptoms. FPCA is a useful alternative for characterising depressive symptoms trajectories and informing time-sensitive preventative measures to reduce the impact of depression before symptoms reach their peak. Early-life stressors may accelerate the timeline and intensity of symptoms escalation during adolescence. Lay summary. Understanding the development of depressive symptoms and the factors shaping them is crucial for designing effective mental health support initiatives. 
This study used data from over 8,000 young people regularly followed up from before birth to compare two cutting-edge methods for describing depressive symptoms trajectories and examined how known risk factors for adulthood depression relate to the severity and rate of change of depressive symptoms in adolescence. We found that both methods performed well and that the peaks in depressive symptoms and their rate of change were, on average, higher and occurred over a year earlier in females than males. Our findings additionally suggest that early-life stressors (e.g., abuse, bullying) may accelerate the development of depression, highlighting the importance of early prevention.
Kamulegeya, R.; Nabatanzi, R.; Semugenze, D.; Mugala, F.; Takuwa, M.; Nasinghe, E.; Musinguzi, D.; Namiiro, S.; Katumba, A.; Ssengooba, W.; Nakatumba-Nabende, J.; Kivunike, F. N.; Kateete, D. P.
Background. Tuberculosis (TB) remains a leading cause of infectious disease mortality worldwide, and treatment failure contributes to ongoing transmission, drug resistance, and poor clinical outcomes. Artificial intelligence and machine learning approaches have attracted growing interest for predicting tuberculosis treatment outcomes, but the literature is heterogeneous and lacks a comprehensive synthesis. Methods. We conducted a systematic review and meta-analysis of studies that developed or validated machine learning models to predict TB treatment failure. We searched PubMed/MEDLINE and Embase from January 2000 to October 2025. Studies were eligible if they developed, validated, or implemented an artificial intelligence or machine learning model for the prediction of TB treatment failure or a closely related poor outcome in patients receiving anti-TB treatment. Risk of bias was assessed using the Prediction model Risk Of Bias Assessment Tool. Random-effects meta-analysis was performed to pool area under the curve values, with subgroup analyses and meta-regression to explore heterogeneity. Results. Thirty-four studies were included in the systematic review, of which 19 reported area under the curve values suitable for meta-analysis (total participants, 100,790). Studies were published between 2014 and 2025, with 91% published from 2019 onward. Tree-based methods were the most common algorithm family (52.9%), and multimodal models integrating three or more data types were used in 41.2% of studies. The pooled area under the curve was 0.836 (95% confidence interval 0.799-0.868), with substantial heterogeneity (I² = 97.9%). In subgroup analyses, studies including HIV-positive participants showed lower discrimination (pooled area under the curve 0.748) compared to those excluding them (0.924). 
Only eight studies (23.5%) performed external validation, and only one study (2.9%) was rated as low risk of bias overall, primarily due to methodological concerns in the analysis domain. Egger's test suggested publication bias (p = 0.024). Major evidence gaps included underrepresentation of high-burden countries, HIV-affected populations, social determinants, pediatric TB, and extrapulmonary disease. Conclusions. Machine learning models for predicting TB treatment failure show promising discrimination but are not yet ready for routine clinical implementation. Performance varies substantially across populations and settings, and methodological limitations, including inadequate validation, poor calibration assessment, and high risk of bias, limit confidence in current estimates. Future research should prioritize rigorous external validation, calibration assessment, and development in underrepresented populations, particularly HIV-affected and high-burden settings. Author Summary. TB kills over a million people annually. While curable, treatment failure remains common and drives ongoing transmission and drug resistance. Researchers increasingly use artificial intelligence and machine learning to predict which patients will fail treatment, but it is unclear if these models are ready for clinical use. We reviewed 34 studies including nearly 1.1 million participants from 22 countries. On average, models correctly distinguished patients who would fail treatment from those who would not 84% of the time, a performance generally considered good. However, this average hid enormous variation. Models developed in populations including HIV-positive people performed substantially worse, suggesting prediction is harder with HIV co-infection. Worryingly, only one study used high-quality methods; 97% had serious flaws in handling missing data, checking calibration, or testing in new populations. Only eight studies validated their models in different settings. 
In conclusion, machine learning shows promise for predicting TB treatment failure but is not ready for clinical use. Researchers should prioritize validation in high-burden settings, include social determinants, and improve methodological rigor before these tools can help patients.
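As a rough illustration of the pooling step described in this abstract, the following is a minimal sketch of a random-effects meta-analysis of area under the curve (AUC) values. The abstract does not say which estimator or transformation the authors used, so the logit transformation and the DerSimonian-Laird between-study variance estimator are assumptions here, and the input AUCs and standard errors are invented for illustration only.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def pool_auc_random_effects(aucs, ses):
    """Pool AUCs on the logit scale (assumed choice) with
    DerSimonian-Laird random effects (assumed estimator)."""
    # Delta method: SE(logit(auc)) is approximately se / (auc * (1 - auc)).
    y = [logit(a) for a in aucs]
    v = [(s / (a * (1.0 - a))) ** 2 for a, s in zip(aucs, ses)]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # Between-study variance (tau^2) and the I^2 heterogeneity statistic.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))

    return {
        "pooled": inv_logit(mu),
        "ci_low": inv_logit(mu - 1.96 * se_mu),
        "ci_high": inv_logit(mu + 1.96 * se_mu),
        "tau2": tau2,
        "i2": i2,
    }

# Hypothetical study-level inputs, not taken from the review.
example = pool_auc_random_effects(
    aucs=[0.75, 0.92, 0.84, 0.80],
    ses=[0.020, 0.015, 0.030, 0.025],
)
```

Pooling on the logit scale keeps the back-transformed confidence limits inside the 0 to 1 range, which is one common reason to prefer it for AUC values close to 1.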
Moon, J.-Y.; Filigrana, P.; Gallo, L. C.; Perreira, K. M.; Cai, J.; Daviglus, M.; Fernandez-Rhodes, L. E.; Garcia-Bedoya, O.; Qi, Q.; Thyagarajan, B.; Tarraf, W.; Wang, T.; Kaplan, R.; Isasi, C. R.
Childhood socioeconomic position (SEP) can have lifelong effects on health. Many studies have used adult height as a surrogate marker for early-life conditions. In this study, we derived the non-genetic component of height, calculated as the residual from sex-specific standardized height regressed on genetically predicted height, as a surrogate for childhood SEP, using data from the Hispanic Community Health Study/Study of Latinos (2008-2011). A positive residual would indicate favorable early-life conditions promoting growth, while a negative residual would indicate early-life adversity that may stunt development. The height residual was associated with early-life variables such as parental education, year of birth, US nativity, and age at first migration to the US (50 states/DC), supporting the validity of the height residual as a surrogate for early-life conditions. Furthermore, the height residual was positively associated with better cardiovascular health (CVH) and cognitive function among middle-aged and older adults. Interestingly, among participants younger than 35 years, the height residual was negatively associated with the "Life's Essential 8" clinical CVH scores. These results support the non-genetic component of height as a surrogate for childhood environment, with predictive value for CVH and cognitive function.
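The height residual described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes a single pooled simple linear regression after within-sex standardization, ignores survey weights and covariates, and uses invented values for height and genetically predicted height.

```python
from statistics import mean, pstdev

def standardize_by_sex(heights, sexes):
    """Convert heights to z-scores within each sex group."""
    groups = {}
    for h, s in zip(heights, sexes):
        groups.setdefault(s, []).append(h)
    stats = {s: (mean(v), pstdev(v)) for s, v in groups.items()}
    return [(h - stats[s][0]) / stats[s][1] for h, s in zip(heights, sexes)]

def ols_residuals(y, x):
    """Residuals from a simple least-squares regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical data: measured height (cm), sex, and a genetically
# predicted height score on an arbitrary standardized scale.
heights = [160.0, 165.0, 170.0, 172.0, 178.0, 185.0]
sexes = ["F", "F", "F", "M", "M", "M"]
genetic_height = [-0.8, 0.1, 0.5, -0.6, 0.2, 0.9]

z_height = standardize_by_sex(heights, sexes)
height_residual = ols_residuals(z_height, genetic_height)
# Positive values: taller than genetically predicted.
# Negative values: shorter than genetically predicted.
```

By construction the residual is uncorrelated with the genetic predictor, which is what lets it be read as the non-genetic part of height.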
Razafimahatratra, S. L.; Rasoloharimanana, L. T.; Andriamaro, T. M.; Ranaivomanana, P.; Schoenhals, M.
Interpreting serological data remains challenging, particularly in low-prevalence or cross-reactive contexts, where antibody responses often show substantial overlap between exposed and unexposed individuals and may depart from normal distributional assumptions. Conventional cutoff-based approaches often yield inconsistent or biased estimates of seroprevalence. Here, we present a decisional framework based on finite mixture models (FMMs) that enhances the robustness and interpretability of serological analyses. Beyond simply applying mixture models, our framework integrates multiple methodological innovations: (i) systematic comparison of Gaussian and skew-normal mixture models to accommodate asymmetric antibody distributions; (ii) rigorous model selection using the Cramér-von Mises test (p > 0.01) combined with a parsimonious score (APS) to prioritize models with well-separated clusters; and (iii) hierarchical clustering of posterior probabilities to collapse latent components into biologically meaningful seronegative and seropositive groups. Applied to chikungunya virus (CHIKV) data from Bangladesh, the framework produced prevalence estimates consistent with ROC-based methods while probabilistically identifying borderline cases. Validation on SARS-CoV-2 and dengue datasets further demonstrated its generalizability: for SARS-CoV-2, the approach identified up to five latent clusters with high sensitivity (up to 100%) and specificity (up to 100%), enabling discrimination by disease severity. For dengue, it revealed interpretable subgrouping consistent with background exposure and subclinical infection, despite limited confirmed cases. By integrating distributional flexibility, robust goodness-of-fit testing, and biologically guided cluster consolidation, this decisional FMM framework provides a reproducible and scalable method for serological interpretation across pathogens and epidemiological settings, addressing key limitations of threshold-based classification.
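To make the mixture-model idea concrete, here is a deliberately simplified sketch: a two-component Gaussian mixture fitted by expectation-maximization to simulated antibody titers. It does not reproduce the framework in the abstract (no skew-normal components, no Cramér-von Mises test, no APS score, no hierarchical clustering), and all function names and simulated values are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_positive(x, mu, sigma, weight):
    """Posterior probability that x belongs to the higher-mean component."""
    p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
    p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
    total = p0 + p1
    return p1 / total if total > 0 else 0.5

def fit_two_gaussians(xs, n_iter=200):
    """EM for a two-component 1D Gaussian mixture."""
    n = len(xs)
    ordered = sorted(xs)
    # Initialize means at the 10th and 90th percentiles, with a shared SD.
    mu = [ordered[n // 10], ordered[(9 * n) // 10]]
    overall = sum(xs) / n
    sd0 = math.sqrt(sum((x - overall) ** 2 for x in xs) / n)
    sigma = [sd0, sd0]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of the "seropositive" component.
        resp = [posterior_positive(x, mu, sigma, weight) for x in xs]
        # M-step: update weights, means, and standard deviations.
        n1 = sum(resp)
        n0 = n - n1
        mu = [
            sum((1.0 - r) * x for r, x in zip(resp, xs)) / n0,
            sum(r * x for r, x in zip(resp, xs)) / n1,
        ]
        sigma = [
            max(1e-6, math.sqrt(sum((1.0 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0)),
            max(1e-6, math.sqrt(sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1)),
        ]
        weight = [n0 / n, n1 / n]
    return mu, sigma, weight

# Simulated log-titers: 300 seronegative and 100 seropositive (invented).
rng = random.Random(0)
titers = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(4.0, 1.0) for _ in range(100)]
mu, sigma, weight = fit_two_gaussians(titers)
seroprevalence_estimate = weight[1]  # mixing weight of the higher component
borderline = posterior_positive(2.0, mu, sigma, weight)  # a value between clusters
```

The mixing weight of the higher-mean component serves as a prevalence estimate without fixing a cutoff, and the posterior probability gives a graded classification for samples that fall between the two clusters.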
Nande, A.; Larsen, S. L.; Turtle, J.; Davis, J. T.; Bandekar, S. R.; Lewis, B.; Chen, S.; Contamin, L.; Jung, S.-m.; Howerton, E.; Shea, K.; Bay, C.; Ben-Nun, M.; Bi, K.; Bouchnita, A.; Chen, J.; Chinazzi, M.; Fox, S. J.; Hill, A. L.; Hochheiser, H.; Lemaitre, J. C.; Loo, S. L.; Marathe, M.; Meyers, L. A.; Pearson, C. A. B.; Porebski, P.; Przykucki, E.; Smith, C. P.; Venkatramanan, S.; Vespignani, A.; Willard, T. C.; Yan, K.; Viboud, C.; Lessler, J.; Truelove, S.
Background Six years after its emergence, SARS-CoV-2 continues to impose a substantial burden. The impact of vaccination and the optimal timing of its rollout remain uncertain given existing population immunity and variability in outbreak timing between summer and winter. Methods The US Scenario Modeling Hub convened its 19th round of ensemble projections for COVID-19 hospitalizations and deaths in the United States, where eight teams projected trajectories in each US state and nationally from April 2025 to April 2026 under five scenarios regarding vaccine recommendations and timing. Recommendations had two eligibility scenarios (high-risk individuals only and all-eligible) and two timing scenarios (classic start: mid-August, earlier start: late June). These were crossed to create four scenarios and were compared against a counterfactual scenario with no vaccination. Findings Compared to no vaccination, our ensemble projections estimated 90,000 (95% PI 53,000 to 126,000) hospitalizations averted in the high-risk and classic timing scenario across the US. Expanding to all-eligible age groups averted an additional 26,000 (95% PI 14,000 to 39,000) hospitalizations, which, when coupled with the early vaccination timing, was projected to further reduce national hospitalizations by 15,000 (95% PI -3,000 to 33,000). The majority of teams projected both summer and winter waves. Implications We project COVID-19 will cause significant hospitalizations and deaths in the US in the 2025-26 season and estimate significant benefits from a broad all-eligible vaccination recommendation. The results also suggest an additional benefit is likely to be gained from an earlier vaccination campaign. Funding Centers for Disease Control and Prevention; National Institutes of Health (US); National Science Foundation (US)
O'Connor, M.; O'Connor, E.; Hughes, E. K.; Bann, D.; Knight, K.; Tabor, E.; Bridger-Staatz, C.; Gray, S.; Burgner, D.; Olsson, C. A.
Background: Population-based cohort studies are increasingly expected to demonstrate benefits for public health and wider society. However, there is limited systematic evidence on what such impact entails or how it is generated and sustained. To address this gap, we examined researcher perspectives on the impact of cohort studies. Methods: We conducted, to our knowledge, the first quantitative study of researcher views on cohort impact, recruiting active cohort researchers through national and international networks between August and December 2025. The anonymous cross-sectional survey captured researcher characteristics, perceived contributions, impact processes, challenges, and open-ended reflections. Results: A total of 163 cohort researchers participated, primarily from Australia (42%) and the UK (23%). Participants perceived their work as informing a wide range of societal issues and reported investing an average of 24% of their work time in impact-related activities. While most respondents (73%) believed their research leads to tangible policy or practice change, two-thirds indicated that impact is rarely or never demonstrable shortly after study completion (67%) and seldom attributable to a single study (67%). Key concerns included pressure to overstate contributions (80%), perceived disadvantages for cohort studies in impact assessments (78%), and inadequate skills or resources to achieve impact (65%). Conclusions: Cohort researchers perceive their work as generating broad societal contributions and invest substantial effort in supporting impact. However, they face systemic challenges in both achieving and demonstrating impact. These findings highlight the need for impact frameworks that better capture complexity, long-term influence, and cumulative contributions, while mitigating unintended consequences.
Danon, L.; Brooks-Pollock, E.
Background Social contact surveys, which measure who-contacts-whom, are widely used to inform infectious disease transmission models and estimate the reproduction number (R), a key metric for assessing epidemic risk. Despite their widespread use, sample size calculations are not routinely performed. Aims To assess the impact of sample size on estimates of R and determine a practical target sample size for social contact surveys used in epidemic modelling. Methods We conducted a review of social contact surveys (2008-2025) to characterise current practice. We characterised the impact of survey size on epidemic metrics using two social contact surveys, the UK Social Contact Survey and POLYMOD (Europe), and two methods. For each dataset and approach, we generated repeated subsamples and calculated the resulting reproduction numbers, characterised their distributions and measured uncertainty. Results We identified 107 unique social contact surveys from 57 studies. Sample sizes ranged from 30 to more than 10,000 participants, with a median of 1,438. One quarter of surveys contained fewer than 1,000 participants. From our simulations, we found that sample sizes below 200 individuals can result in highly variable reproduction numbers. Increasing sample size increases precision, and the most meaningful gains are up to 1,300 individuals. Increasing sample sizes over 3,000 individuals leads to smaller gains. Conclusions A minimum sample size of approximately 1,200-1,300 participants appears sufficient for general-purpose use. These findings support the inclusion of sample size considerations in the design, reporting and interpretation of social contact surveys used for epidemic intelligence and public health decision-making.
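The repeated-subsampling procedure in this abstract can be sketched as follows. The abstract does not specify the two methods used to compute reproduction numbers, so this example assumes a simple degree-based proxy in which R is proportional to the mean squared number of contacts divided by the mean number of contacts. The contact counts are simulated, not drawn from the UK Social Contact Survey or POLYMOD.

```python
import random
import statistics

def r_proxy(contacts):
    """Assumed proxy: R is proportional to E[k^2] / E[k], where k is the
    number of contacts a respondent reports."""
    mean_k = statistics.fmean(contacts)
    mean_k2 = statistics.fmean([k * k for k in contacts])
    return mean_k2 / mean_k

def subsample_spread(population, sample_size, n_reps, rng):
    """Draw repeated subsamples without replacement and summarize the
    resulting R proxies by their mean and standard deviation."""
    estimates = [
        r_proxy(rng.sample(population, sample_size)) for _ in range(n_reps)
    ]
    return statistics.fmean(estimates), statistics.stdev(estimates)

# Simulated survey: right-skewed daily contact counts (invented values).
rng = random.Random(42)
survey = [int(rng.lognormvariate(2.0, 0.8)) for _ in range(20000)]

# Compare the variability of the proxy across candidate sample sizes.
spread_by_size = {
    n: subsample_spread(survey, n, n_reps=200, rng=rng)
    for n in (100, 500, 1300, 3000)
}
```

Plotting the standard deviation against sample size from a run like this is one way to see where precision gains flatten, which is the kind of evidence the abstract uses to suggest a target sample size.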